Automatic Differentiation
Tensorial differentiates scalar, vector, tensor, and packed-state functions in the tensor spaces of their arguments. For most code, start with gradient and hessian. The callable operator ∂ is the general form for higher derivatives.
Gradients and hessians
For a scalar-valued function of a vector input, gradient returns a vector and hessian returns a second-order tensor:
julia> a = @Vec [3.0, 4.0]2-element Vec{2, Float64}: 3.0 4.0julia> φ(a) = 0.5 * (a ⊡ a)φ (generic function with 1 method)julia> gradient(φ, a)2-element Vec{2, Float64}: 3.0 4.0julia> hessian(φ, a)2×2 SymmetricSecondOrderTensor{2, Float64, 3}: 1.0 0.0 0.0 1.0
When an argument is a symmetric tensor, AD is performed on that symmetric tensor space:
julia> ε = symmetric(@Mat [0.02 0.01 0.0; 0.01 0.00 0.0; 0.0 0.0 -0.01])3×3 SymmetricSecondOrderTensor{3, Float64, 6}: 0.02 0.01 0.0 0.01 0.0 0.0 0.0 0.0 -0.01julia> K = 10.0; # bulk modulusjulia> G = 5.0; # shear modulusjulia> ψ(ε) = K/2 * tr(ε)^2 + G * (dev(ε) ⊡₂ dev(ε))ψ (generic function with 1 method)julia> σ = gradient(ψ, ε);julia> σ isa SymmetricSecondOrderTensor{3}truejulia> ℂ = hessian(ψ, ε);julia> ℂ isa SymmetricFourthOrderTensor{3}true
The tensor space of the argument matters. If a value is symmetric only numerically, but its type is a general matrix tensor, the derivative is taken in the general matrix space:
julia> A = rand(Mat{3,3})3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}: 0.325977 0.894245 0.953125 0.549051 0.353112 0.795547 0.218587 0.394255 0.49425julia> S = A * A'; # symmetric in value, but not typed as a symmetric tensor3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}: 1.81438 1.253 0.894897 1.253 1.05904 0.65243 0.894897 0.65243 0.4475julia> gradient(identity, S) ≈ one(FourthOrderTensor{3})truejulia> gradient(identity, symmetric(S)) ≈ one(SymmetricFourthOrderTensor{3})true
For more on symmetric tensor spaces, see Tensor Types and Spaces.
Returning the function value
Pass :all when you want derivatives and the function value from one call. For gradient, the return value is
(gradient(f, x), f(x))and for hessian, it is
(hessian(f, x), gradient(f, x), f(x))julia> gradient(φ, a, :all)([3.0, 4.0], 12.5)julia> hessian(φ, a, :all)([1.0 0.0; 0.0 1.0], [3.0, 4.0], 12.5)
For the general operator ∂{N}, :all returns derivatives from higher to lower order, ending with the function value:
(∂{N}(f, x), ..., ∂{2}(f, x), ∂(f, x), f(x))The general operator ∂
For scalar input and scalar output, ∂ is the most direct spelling. ∂(f, args...) is equivalent to ∂{1}(f, args...). Use braces for higher orders:
julia> f(x) = x^4 + xf (generic function with 1 method)julia> x = 2.02.0julia> f(x)18.0julia> ∂(f, x)33.0julia> ∂{2}(f, x)48.0julia> ∂{2}(f, x, :all)(48.0, 33.0, 18.0)julia> ∂{3}(x -> x^5, x)240.0
The general operator is useful when the derivative order is part of the formula you want to write down. gradient(f, x) is ∂{1}(f, x), and hessian(f, x) is ∂{2}(f, x).
Multiple inputs
For multiple inputs, the first derivative is returned as a tuple whose entries follow the order of the inputs:
julia> gradient((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0)(16.0, 14.0)julia> gradient((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0, :all)((16.0, 14.0), 44.0)
The result represents:
(∂f/∂x, ∂f/∂y)With :all, it returns:
((∂f/∂x, ∂f/∂y), f(x, y))Second derivatives for multiple inputs are returned as a block Hessian:
julia> hessian((x, y) -> x^2 + x*y + y^3, 2.0, 3.0)((2.0, 1.0), (1.0, 18.0))julia> hessian((x, y) -> x^2 + x*y + y^3, 2.0, 3.0, :all)(((2.0, 1.0), (1.0, 18.0)), (7.0, 29.0), 37.0)
The Hessian block structure is
(
(∂²f/∂x², ∂²f/∂x∂y),
(∂²f/∂y∂x, ∂²f/∂y²),
)The same block structure is used when the input types differ:
julia> A = symmetric(@Mat [1.0 0.2; 0.2 2.0])2×2 SymmetricSecondOrderTensor{2, Float64, 3}: 1.0 0.2 0.2 2.0julia> d = gradient((x, A) -> x * tr(A), x, A)(3.0, [2.0 0.0; 0.0 2.0])julia> d[1]3.0julia> d[2] isa SymmetricSecondOrderTensor{2}truejulia> H = hessian((x, A) -> x * tr(A), x, A);julia> H[1][2] isa SymmetricSecondOrderTensor{2}truejulia> H[2][2] isa SymmetricFourthOrderTensor{2}true
Multiple outputs
If f returns a tuple, each output is differentiated separately. The outer tuple follows the outputs:
julia> gradient(x -> (x^2, x^3), 2.0)(4.0, 12.0)julia> gradient(x -> (x^2, x^3), 2.0, :all)((4.0, 12.0), (4.0, 8.0))
The result represents:
(∂f₁/∂x, ∂f₂/∂x)With :all, it returns:
((∂f₁/∂x, ∂f₂/∂x), (f₁(x), f₂(x)))Second derivatives are handled in the same way:
julia> hessian(x -> (x^2, x^3), 2.0)(2.0, 12.0)julia> hessian(x -> (x^2, x^3), 2.0, :all)((2.0, 12.0), (4.0, 12.0), (4.0, 8.0))
The result represents:
(∂²f₁/∂x², ∂²f₂/∂x²)and :all returns
(
(∂²f₁/∂x², ∂²f₂/∂x²),
(∂f₁/∂x, ∂f₂/∂x),
(f₁(x), f₂(x)),
)Multiple inputs and multiple outputs
When there are both multiple inputs and multiple outputs, the outer tuple follows the outputs, and the inner tuple follows the inputs:
julia> gradient((x, y) -> (x + y, x * y), 2.0, 3.0)((1.0, 1.0), (3.0, 2.0))julia> gradient((x, y) -> (x + y, x * y), 2.0, 3.0, :all)(((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))
The result represents:
(
(∂f₁/∂x, ∂f₁/∂y),
(∂f₂/∂x, ∂f₂/∂y),
)For second derivatives, each output carries its own block Hessian:
julia> hessian((x, y) -> (x + y, x * y), 2.0, 3.0)(((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0)))julia> hessian((x, y) -> (x + y, x * y), 2.0, 3.0, :all)((((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0))), ((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))
The result represents:
(
(
(∂²f₁/∂x², ∂²f₁/∂x∂y),
(∂²f₁/∂y∂x, ∂²f₁/∂y²),
),
(
(∂²f₂/∂x², ∂²f₂/∂x∂y),
(∂²f₂/∂y∂x, ∂²f₂/∂y²),
),
)With :all, the return value is
(
second_derivatives,
first_derivatives,
function_value,
)where second_derivatives has the block-Hessian structure shown above and first_derivatives has the output/input structure of gradient.
For automatic-differentiation docstrings, see Automatic differentiation API.